# Multimodal Conversion
Index Anisora 5B Diffusers
Apache-2.0
A 5B-parameter image-to-video generation model implemented with Diffusers.
Text-to-Video
Disty0
82
1
Hunyuanvideo I2V
Tencent's HunyuanVideo-I2V is a Diffusers-based image-to-video model capable of converting static images into dynamic videos.
Image-to-Text
hunyuanvideo-community
496
2
Minicpm O 2 6 GGUF
MiniCPM-o-2_6 is a multimodal conversion model supporting multiple languages and suitable for various tasks.
Text-to-Image Other
second-state
506
6
Rexseek 3B
Other
An image-to-text model that takes both image and text inputs and generates corresponding text outputs.
Text-to-Image
Transformers
IDEA-Research
186
4
Vit GPT2 Image Captioning Model
An image captioning model based on the ViT-GPT2 architecture that converts input images into descriptive text.
Image-to-Text
Transformers
motheecreator
142
0
Vchitect 2.0 2B
Apache-2.0
Vchitect-2.0 is a parallel Transformer model for scaling video diffusion models, specializing in text-to-video and image-to-video generation tasks.
Video Processing
Vchitect
50
38
Image Model
A Transformers-based image-to-text model. Its specific functionality is not documented in further detail.
Image-to-Text
Transformers
Mouwiya
15
0
4M 7 SR L CC12M
Other
4M is a scalable multimodal masked modeling framework that supports any-to-any modality conversion, covering dozens of modalities and tasks.
Multimodal Fusion
EPFL-VILAB
26
2
Hashtaggenerater
Flickr30k is an English dataset for image-to-text tasks, commonly used for training and evaluating image caption generation models.
Image-to-Text
Transformers English
kusumakar
24
2
Featured Recommended AI Models